Collect spatial data¶

Created by

¶ Li, Chaonan (李超男) / licn@mtc.edu.cn / Ecological Security and Protection Key Laboratory of Sichuan Province, Mianyang Normal University (绵阳师范学院生态安全与保护四川省重点实验室)

¶ Liu Chi (刘驰) / liuchi0426@126.com / College of Resources and Environment, Fujian Agriculture and Forestry University (福建农林大学资源与环境学院)

¶ Liao, Haijun (廖海君) / lihj@mtc.edu.cn / Engineering Research Center of Chuanxibei RHS Construction at Mianyang Normal University of Sichuan Province (绵阳师范学院川西北乡村人居环境建设工程研究中心)

Reviewed by

¶ Li, Xiangzhen (李香真) / lixz@fafu.edu.cn / College of Resources and Environment, Fujian Agriculture and Forestry University (福建农林大学资源与环境学院)

Analyzing the spatial patterns of microbial traits also poses a significant challenge due to the limited availability of environmental data that correspond to microbiome data, and this dearth of paired datasets is particularly prevalent in the majority of currently accessible public repositories. The geodata R package provides some method to download spatial, climate and soil properties from public repositories, and the MODIS hosts many remote-sense images of plant metrics and land cover. To a certain extent, this mitigates the difficult of limited environmental data, though the accuracy of their absolute values is relatively lower compared to those measured ones.

Yet, it is extremely challenging for most R language beginners to effectively integrate these datasets for microbial biogeography analysis, as the whole process involves manipulating diverse datasets. To solve this problem, we have designed a series of functions to download various spatial, climate, vegetation, and soil parameters from public repositories. Although the accuracy of these datasets are lower than that of measured ones, they can still be associated with microbial biogeographical features at a larger spatial scale.

Please note that all spatial data would be resampled to a same spatial resolution based on the first downloaded SpatRaster!

Now, let's go through each of these functions and see how to download environmental properties from public repositories.

Load required R packages¶

In [1]:
# Load three required packages 
suppressMessages(require("magrittr")) 
require("ggplot2")  %>% suppressMessages()
require("microgeo") %>% suppressMessages()

Create a standard microgeo dataset¶

We need a standard microgeo dataset for the presentations in the section of tutorial.

In [2]:
# Example by using the map downloaded from the DataV.GeoAtlas
data(qtp)
map <- read_aliyun_map(adcode = c(540000, 630000, 510000)) %>% suppressMessages() 
dataset.dts.aliyun <- create_dataset(mat = qtp$asv, ant = qtp$tax, met = qtp$met, map = map,
                                     phy = qtp$tre, env = qtp$env, lon = "longitude", lat = "latitude") 
dataset.dts.aliyun %>% show_dataset()
ℹ [2024-01-04 17:10:15.267327] INFO ==> all samples fall within the map area!

ℹ [2024-01-04 17:10:15.276641] INFO ==> dataset has been created successfully!

ℹ [2024-01-04 17:10:15.287247] INFO ==> use `object %>% show_dataset()` to check the summary of dataset.

── The Summary of Microgeo Dataset ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
! object$mat: 6808 ASVs/genes and 1244 samples [need to be subsampled!]

ℹ object$ant: 6808 ASVs/genes and 7 annotation levels (Kingdom, Phylum, Class, Order, Family, Genus, Species)

ℹ object$met: 1244 samples and 2 variables (longitude, latitude)

ℹ object$map: a SpatialPolygonsDataFrame with the CRS of '+proj=longlat +datum=WGS84 +no_defs'

ℹ object$phy: a phylogenetic tree with 6808 tip labels

ℹ object$env: 1244 samples and 10 variables

• To check the summary of dataset, Replace `object` with the variable name of your dataset
• For example, if the variable name is `dataset.dts`you can run `head(dataset.dts$met)` to check the content of `met`

Collect the aridity index¶

The get_ai() is implemented to download aridity index from global aridity index database. The resolution of original data is 30''. Here is a simple example.

In [3]:
# Download aridity index for research area 
dataset.dts.aliyun %<>% get_ai(out.dir = "../dev/dat/rundata")
✔ [2024-01-04 17:10:20.571551] SAVE ==> results have been saved to: object$spa$rast$his$AI

In [4]:
# Show dataset 
dataset.dts.aliyun %>% show_dataset()
── The Summary of Microgeo Dataset ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
! object$mat: 6808 ASVs/genes and 1244 samples [need to be subsampled!]

ℹ object$ant: 6808 ASVs/genes and 7 annotation levels (Kingdom, Phylum, Class, Order, Family, Genus, Species)

ℹ object$met: 1244 samples and 2 variables (longitude, latitude)

ℹ object$map: a SpatialPolygonsDataFrame with the CRS of '+proj=longlat +datum=WGS84 +no_defs'

ℹ object$phy: a phylogenetic tree with 6808 tip labels

ℹ object$env: 1244 samples and 10 variables

── The Summary of Biogeographic Traits ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
✔ object$spa: 1 historically numeric variables; 0 historically classification variables; 0 groups of future climate data

• To check the summary of dataset, Replace `object` with the variable name of your dataset
• For example, if the variable name is `dataset.dts`you can run `head(dataset.dts$met)` to check the content of `met`
In [5]:
# Visualize the aridity index
options(repr.plot.width = 13.43 * 2, repr.plot.height = 7.9)
q1 <- plot_bmap(map = dataset.dts.aliyun$map) %>%
    add_spatraster(spat.raster = dataset.dts.aliyun$spa$rast$his$AI, 
                   color = colorRampPalette(RColorBrewer::brewer.pal(11, "RdYlGn"))(100)) %>% 
    add_north_arrow() %>% add_scale_bar() %>% add_crs()
q2 <- plot_bmap(map = dataset.dts.aliyun$map) %>%
   add_spatraster(spat.raster = dataset.dts.aliyun$spa$rast$his$AI, breaks = c(0.03, 0.2, 0.5, 0.65),
                  color = RColorBrewer::brewer.pal(11, "RdYlGn")[c(1,3,5,9,11)], labels = c("HAR", "AR", "SER", "SHR", "HR")) %>% 
   add_north_arrow() %>% add_scale_bar() %>% add_crs()
cowplot::plot_grid(q1, q2, align = 'hv', ncol = 2, labels = c("a", "b"), label_size = 28) %>% suppressWarnings()
No description has been provided for this image

Collect the elevation¶

The get_elev() is implemented to download elevation data from WorldClim database version 2.1. You are allowed to specify a spatial resolution. Here is a simple example based on the resolution of 2.5'.

In [6]:
# Download elevation for research area 
dataset.dts.aliyun %<>% get_elev(res = 2.5, out.dir = "../dev/dat/rundata")
✔ [2024-01-04 17:10:30.039436] SAVE ==> results have been saved to: object$spa$rast$his$ELEV

In [7]:
# Show dataset 
dataset.dts.aliyun %>% show_dataset()
── The Summary of Microgeo Dataset ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
! object$mat: 6808 ASVs/genes and 1244 samples [need to be subsampled!]

ℹ object$ant: 6808 ASVs/genes and 7 annotation levels (Kingdom, Phylum, Class, Order, Family, Genus, Species)

ℹ object$met: 1244 samples and 2 variables (longitude, latitude)

ℹ object$map: a SpatialPolygonsDataFrame with the CRS of '+proj=longlat +datum=WGS84 +no_defs'

ℹ object$phy: a phylogenetic tree with 6808 tip labels

ℹ object$env: 1244 samples and 10 variables

── The Summary of Biogeographic Traits ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
✔ object$spa: 2 historically numeric variables; 0 historically classification variables; 0 groups of future climate data

• To check the summary of dataset, Replace `object` with the variable name of your dataset
• For example, if the variable name is `dataset.dts`you can run `head(dataset.dts$met)` to check the content of `met`
In [8]:
# Visualize the elevation
options(repr.plot.width = 13.43 * 2, repr.plot.height = 7.9)
q3 <- plot_bmap(map = dataset.dts.aliyun$map) %>%
    add_spatraster(spat.raster = dataset.dts.aliyun$spa$rast$his$ELEV) %>% 
    add_north_arrow() %>% add_scale_bar() %>% add_crs()
q4 <- plot_bmap(map = dataset.dts.aliyun$map) %>%
   add_spatraster(spat.raster = dataset.dts.aliyun$spa$rast$his$ELEV, breaks = c(3000, 4000, 5000, 6000), 
                  labels = c("<3000", "3000-4000", "4000-5000", "5000-6000", ">6000")) %>% 
   add_north_arrow() %>% add_scale_bar() %>% add_crs()
cowplot::plot_grid(q3, q4, align = 'hv', ncol = 2, labels = c("a", "b"), label_size = 28) %>% suppressWarnings()
No description has been provided for this image

Collect the historical bioclimatic variables¶

The get_his_bioc() is implemented to download 19 historically bioclimatic variables from WorldClim database version 2.1. You are allowed to specify a spatial resolution. Here is a simple example based on the resolution of 2.5'.

In [9]:
# Download 19 historically bioclimatic variables of research area
dataset.dts.aliyun %<>% get_his_bioc(res = 2.5, out.dir = "../dev/dat/rundata")
✔ [2024-01-04 17:10:52.243656] SAVE ==> results have been saved to: object$spa$rast$his(19 variables)

In [10]:
# Show dataset 
dataset.dts.aliyun %>% show_dataset()
── The Summary of Microgeo Dataset ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
! object$mat: 6808 ASVs/genes and 1244 samples [need to be subsampled!]

ℹ object$ant: 6808 ASVs/genes and 7 annotation levels (Kingdom, Phylum, Class, Order, Family, Genus, Species)

ℹ object$met: 1244 samples and 2 variables (longitude, latitude)

ℹ object$map: a SpatialPolygonsDataFrame with the CRS of '+proj=longlat +datum=WGS84 +no_defs'

ℹ object$phy: a phylogenetic tree with 6808 tip labels

ℹ object$env: 1244 samples and 10 variables

── The Summary of Biogeographic Traits ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
✔ object$spa: 21 historically numeric variables; 0 historically classification variables; 0 groups of future climate data

• To check the summary of dataset, Replace `object` with the variable name of your dataset
• For example, if the variable name is `dataset.dts`you can run `head(dataset.dts$met)` to check the content of `met`
In [11]:
# Visualize the Bio1 (Mean annual temperature) 
options(repr.plot.width = 13.43 * 2, repr.plot.height = 7.9)
q5 <- plot_bmap(map = dataset.dts.aliyun$map) %>%
    add_spatraster(spat.raster = dataset.dts.aliyun$spa$rast$his$Bio1) %>% 
    add_north_arrow() %>% add_scale_bar() %>% add_crs()
q6 <- plot_bmap(map = dataset.dts.aliyun$map) %>%
   add_spatraster(spat.raster = dataset.dts.aliyun$spa$rast$his$Bio1, breaks = c(-1, 0, 1), labels = c("A", "B", "C", "D")) %>% 
   add_north_arrow() %>% add_scale_bar() %>% add_crs()
cowplot::plot_grid(q5, q6, align = 'hv', ncol = 2, labels = c("a", "b"), label_size = 28) %>% suppressWarnings()
No description has been provided for this image
In [12]:
# Visualize the Bio12 (Mean annual precipitation) 
options(repr.plot.width = 13.43 * 2, repr.plot.height = 7.9)
q7 <- plot_bmap(map = dataset.dts.aliyun$map) %>%
    add_spatraster(spat.raster = dataset.dts.aliyun$spa$rast$his$Bio12) %>% 
    add_north_arrow() %>% add_scale_bar() %>% add_crs()
q8 <- plot_bmap(map = dataset.dts.aliyun$map) %>%
   add_spatraster(spat.raster = dataset.dts.aliyun$spa$rast$his$Bio12, breaks = c(100, 300, 500), labels = c("<100", "100-300", "300-500", ">500")) %>% 
   add_north_arrow() %>% add_scale_bar() %>% add_crs()
cowplot::plot_grid(q7, q8, align = 'hv', ncol = 2, labels = c("a", "b"), label_size = 28) %>% suppressWarnings()
No description has been provided for this image

Collect the futrue bioclimatic variables¶

The get_fut_bioc() is implemented to download 19 future bioclimatic variables from WorldClim database version 2.1. You are allowed to specify a spatial resolution. Here is a simple example based on the resolution of 2.5'.

In [13]:
# Download futrue bioclimatic variables of research area
dataset.dts.aliyun %<>% get_fut_bioc(res = 2.5, out.dir = "../dev/dat/rundata")
✔ [2024-01-04 17:11:41.465586] SAVE ==> results have been saved to: object$spa$rast$fut [19 variables; 4 groups]

In [14]:
# Show dataset 
dataset.dts.aliyun %>% show_dataset()
── The Summary of Microgeo Dataset ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
! object$mat: 6808 ASVs/genes and 1244 samples [need to be subsampled!]

ℹ object$ant: 6808 ASVs/genes and 7 annotation levels (Kingdom, Phylum, Class, Order, Family, Genus, Species)

ℹ object$met: 1244 samples and 2 variables (longitude, latitude)

ℹ object$map: a SpatialPolygonsDataFrame with the CRS of '+proj=longlat +datum=WGS84 +no_defs'

ℹ object$phy: a phylogenetic tree with 6808 tip labels

ℹ object$env: 1244 samples and 10 variables

── The Summary of Biogeographic Traits ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
✔ object$spa: 21 historically numeric variables; 0 historically classification variables; 4 groups of future climate data

• To check the summary of dataset, Replace `object` with the variable name of your dataset
• For example, if the variable name is `dataset.dts`you can run `head(dataset.dts$met)` to check the content of `met`
In [15]:
# Visualize the Bio1 (Mean annual temperature) of `BCC-CSM2-MR|ssp585|2061-2080` 
options(repr.plot.width = 13.43 * 2, repr.plot.height = 7.9)
q9 <- plot_bmap(map = dataset.dts.aliyun$map) %>%
    add_spatraster(spat.raster = dataset.dts.aliyun$spa$rast$fut$`BCC-CSM2-MR|ssp585|2061-2080`$Bio1) %>% 
    add_north_arrow() %>% add_scale_bar() %>% add_crs()
q10 <- plot_bmap(map = dataset.dts.aliyun$map) %>%
   add_spatraster(spat.raster = dataset.dts.aliyun$spa$rast$fut$`BCC-CSM2-MR|ssp585|2061-2080`$Bio1, breaks = c(-1, 0, 1), labels = c("A", "B", "C", "D")) %>% 
   add_north_arrow() %>% add_scale_bar() %>% add_crs()
cowplot::plot_grid(q9, q10, align = 'hv', ncol = 2, labels = c("a", "b"), label_size = 28) %>% suppressWarnings()
No description has been provided for this image
In [16]:
# Visualize the Bio12 (Mean annual precipitation) of `BCC-CSM2-MR|ssp585|2061-2080` 
options(repr.plot.width = 13.43 * 2, repr.plot.height = 7.9)
q11 <- plot_bmap(map = dataset.dts.aliyun$map) %>%
    add_spatraster(spat.raster = dataset.dts.aliyun$spa$rast$fut$`BCC-CSM2-MR|ssp585|2061-2080`$Bio12) %>% 
    add_north_arrow() %>% add_scale_bar() %>% add_crs()
q12 <- plot_bmap(map = dataset.dts.aliyun$map) %>%
   add_spatraster(spat.raster = dataset.dts.aliyun$spa$rast$fut$`BCC-CSM2-MR|ssp585|2061-2080`$Bio12, breaks = c(100, 300, 500), labels = c("<100", "100-300", "300-500", ">500")) %>% 
   add_north_arrow() %>% add_scale_bar() %>% add_crs()
cowplot::plot_grid(q11, q12, align = 'hv', ncol = 2, labels = c("a", "b"), label_size = 28) %>% suppressWarnings()
No description has been provided for this image

Collect the numeric metrics from MODIS¶

The get_modis_num_metrics() is implemented to download numeric MODIS metrics from the EOSDIS. Running show_modis_num_metrics() can display detailed infomation about the avaliable metrics. Regarding to the NDVI and ENV, you can specify a spatial resolution (meter). Here is a simple example.

In [17]:
# Show all avaliable numeric MODIS metrics
show_modis_num_metrics()
A data.frame: 28 × 7
measureresolutiontypenamesdsunitscale.factor
<chr><chr><chr><chr><chr><chr><chr>
1NDVI 250 TerraMOD13Q1 250m 16 days NDVIno unit 1e-04
2NDVI 500 TerraMOD13A1 500m 16 days NDVIno unit 1e-04
3NDVI 1000TerraMOD13A2 1 km 16 days NDVIno unit 1e-04
4NDVI 250 Aqua MYD13Q1 250m 16 days NDVIno unit 1e-04
5NDVI 500 Aqua MYD13A1 500m 16 days NDVIno unit 1e-04
6NDVI 1000Aqua MYD13A2 1 km 16 days NDVIno unit 1e-04
7EVI 250 TerraMOD13Q1 250m 16 days EVI no unit 1e-04
8EVI 500 TerraMOD13A1 500m 16 days EVI no unit 1e-04
9EVI 1000TerraMOD13A2 1 km 16 days EVI no unit 1e-04
10EVI 250 Aqua MYD13Q1 250m 16 days EVI no unit 1e-04
11EVI 500 Aqua MYD13A1 500m 16 days EVI no unit 1e-04
12EVI 1000Aqua MYD13A2 1 km 16 days EVI no unit 1e-04
13GPP 500 TerraMOD17A2HGFGpp_500m kg C/m^2 1e-04
14GPP 500 Aqua MYD17A2H Gpp_500m kg C/m^2 1e-04
15PsnNet500 TerraMOD17A2HGFPsnNet_500m kg C/m^2 1e-04
16PsnNet500 Aqua MYD17A2H PsnNet_500m kg C/m^2 1e-04
17NPP 500 TerraMOD17A3HGFNpp_500m kg C/m^2 1e-04
18NPP 500 Aqua MYD17A3HGFNpp_500m kg C/m^2 1e-04
19LST 1000TerraMOD11A1 LST_Day_1km Kelvin 0.02
20LST 1000Aqua MYD11A1 LST_Day_1km Kelvin 0.02
21ET 500 TerraMOD16A2 ET_500m kg/m²/8day0.1
22ET 500 Aqua MYD16A2 ET_500m kg/m²/8day0.1
23PET 500 TerraMOD16A2 PET_500m kg/m²/8day0.1
24PET 500 Aqua MYD16A2 PET_500m kg/m²/8day0.1
25LE 500 TerraMOD16A2 LE_500m J/m²/day 10000
26LE 500 Aqua MYD16A2 LE_500m J/m²/day 10000
27PLE 500 TerraMOD16A2 PLE_500m J/m²/day 10000
28PLE 500 Aqua MYD16A2 PLE_500m J/m²/day 10000
In [18]:
# Download numeric metrics from MODIS
# Please replace <username> and <passwd> with real value
dataset.dts.aliyun %<>% get_modis_num_metrics(username = "username", password = "passwd", date.ran = c("2019-08-01|2019-09-01", "2020-08-01|2020-09-01"),
                                              measures = c("NDVI", "EVI"), out.dir = "../dev/dat/rundata")
ℹ [2024-01-04 17:11:55.234416] INFO ==> preparing MODIS product list for searching...

ℹ [2024-01-04 17:11:55.243476] INFO ==> searching avaliable MODIS products...

ℹ [2024-01-04 17:11:55.25067] INFO ==> current product (1/2): MOD13A2 (NDVI|EVI--> 2019-08-01 to 2019-09-01)

ℹ [2024-01-04 17:11:57.686182] INFO ==> current product (2/2): MOD13A2 (NDVI|EVI--> 2020-08-01 to 2020-09-01)

ℹ [2024-01-04 17:12:01.529056] INFO ==> find 48 files with 0.97 GB in total...

ℹ [2024-01-04 17:12:01.546177] INFO ==> downloading all avaliable MODIS products[skip if the file exists]...

ℹ [2024-01-04 17:12:01.561643] INFO ==> preparing the PTVs (Product, Time, Version) for merging remote-sensing images...

ℹ [2024-01-04 17:12:01.653311] INFO ==> converting hdf files to tif files...

ℹ [2024-01-04 17:12:01.662253] INFO ==> current product (1/1): MOD13A2 (convert 48 hdf files into 12 tif files using 12 threads)

ℹ [2024-01-04 17:12:01.932387] INFO ==> calculating average values based on date range...

ℹ [2024-01-04 17:12:01.948657] INFO ==> current measure (1/2): NDVI_061(30 threads)

ℹ [2024-01-04 17:12:01.985294] INFO ==> current measure (2/2): EVI_061(30 threads)

ℹ [2024-01-04 17:12:02.00275] INFO ==> reprojecting the CRS of SpatRaster to epsg:+proj=longlat +datum=WGS84 +no_defs, it takes a while...

✔ [2024-01-04 17:12:15.695229] SAVE ==> results have been saved to: object$spa$rast$his$NDVI

ℹ [2024-01-04 17:12:15.721029] INFO ==> reprojecting the CRS of SpatRaster to epsg:+proj=longlat +datum=WGS84 +no_defs, it takes a while...

✔ [2024-01-04 17:12:29.558543] SAVE ==> results have been saved to: object$spa$rast$his$EVI

In [19]:
# Show dataset 
dataset.dts.aliyun %>% show_dataset()
── The Summary of Microgeo Dataset ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
! object$mat: 6808 ASVs/genes and 1244 samples [need to be subsampled!]

ℹ object$ant: 6808 ASVs/genes and 7 annotation levels (Kingdom, Phylum, Class, Order, Family, Genus, Species)

ℹ object$met: 1244 samples and 2 variables (longitude, latitude)

ℹ object$map: a SpatialPolygonsDataFrame with the CRS of '+proj=longlat +datum=WGS84 +no_defs'

ℹ object$phy: a phylogenetic tree with 6808 tip labels

ℹ object$env: 1244 samples and 10 variables

── The Summary of Biogeographic Traits ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
✔ object$spa: 23 historically numeric variables; 0 historically classification variables; 4 groups of future climate data

• To check the summary of dataset, Replace `object` with the variable name of your dataset
• For example, if the variable name is `dataset.dts`you can run `head(dataset.dts$met)` to check the content of `met`
In [20]:
# Visualize the NDVI and EVI 
options(repr.plot.width = 13.43 * 2, repr.plot.height = 7.9)
q13 <- plot_bmap(map = dataset.dts.aliyun$map) %>%
    add_spatraster(spat.raster = dataset.dts.aliyun$spa$rast$his$NDVI) %>% 
    add_north_arrow() %>% add_scale_bar() %>% add_crs()
q14 <- plot_bmap(map = dataset.dts.aliyun$map) %>%
    add_spatraster(spat.raster = dataset.dts.aliyun$spa$rast$his$EVI) %>% 
    add_north_arrow() %>% add_scale_bar() %>% add_crs()
cowplot::plot_grid(q13, q14, align = 'hv', ncol = 2, labels = c("a", "b"), label_size = 28) %>% suppressWarnings()
No description has been provided for this image

Collect the classification metrics from MODIS¶

The get_modis_cla_metrics() is used to download classification MODIS metrics from the EOSDIS. Running show_modis_cla_metrics() can display detailed infomation about the avaliable metrics. Here is a simple example.

In [21]:
# Show all avaliable classification metrics
show_modis_cla_metrics()
A data.frame: 11 × 7
measureresolutiontypenamesdsunitscale.factor
<chr><chr><chr><chr><chr><chr><chr>
1LC_Type1 ModerateCombinedMCD12Q1LC_Type1 no unitNA
2LC_Type2 ModerateCombinedMCD12Q1LC_Type2 no unitNA
3LC_Type3 ModerateCombinedMCD12Q1LC_Type3 no unitNA
4LC_Type4 ModerateCombinedMCD12Q1LC_Type4 no unitNA
5LC_Type5 ModerateCombinedMCD12Q1LC_Type5 no unitNA
6LC_Prop1 ModerateCombinedMCD12Q1LC_Prop1 no unitNA
7LC_Prop2 ModerateCombinedMCD12Q1LC_Prop2 no unitNA
8LC_Prop3 ModerateCombinedMCD12Q1LC_Prop3 no unitNA
9LC_Prop1_AssessmentModerateCombinedMCD12Q1LC_Prop1_Assessmentno unitNA
10LC_Prop2_AssessmentModerateCombinedMCD12Q1LC_Prop2_Assessmentno unitNA
11LC_Prop3_AssessmentModerateCombinedMCD12Q1LC_Prop3_Assessmentno unitNA
In [22]:
# Download classification metrics from MODIS
# Please replace <username> and <passwd> with real value
dataset.dts.aliyun %<>% get_modis_cla_metrics(username = "username", password = "passwd", 
                                              measures = "LC_Type1", out.dir = "../dev/dat/rundata")
ℹ [2024-01-04 17:12:36.405507] INFO ==> preparing MODIS product list for searching...

ℹ [2024-01-04 17:12:36.415375] INFO ==> searching avaliable MODIS products...

ℹ [2024-01-04 17:12:36.424059] INFO ==> current product (1/1): MCD12Q1 (LC_Type1--> 2022-01-01 to 2022-12-31)

ℹ [2024-01-04 17:12:37.297119] INFO ==> find 8 files with 0.09 GB in total...

ℹ [2024-01-04 17:12:37.306967] INFO ==> downloading all avaliable MODIS products[skip if the file exists]...

ℹ [2024-01-04 17:12:37.317382] INFO ==> preparing the PTVs (Product, Time, Version) for merging remote-sensing images...

ℹ [2024-01-04 17:12:37.335075] INFO ==> converting hdf files to tif files...

ℹ [2024-01-04 17:12:37.346099] INFO ==> current product (1/1): MCD12Q1 (convert 8 hdf files into 1 tif files using 1 threads)

ℹ [2024-01-04 17:12:37.576444] INFO ==> collecting all merged image files...

ℹ [2024-01-04 17:12:37.590373] INFO ==> current measure (1/1): LC_Type1_061

ℹ [2024-01-04 17:12:37.638847] INFO ==> reprojecting the CRS of SpatRaster to epsg:+proj=longlat +datum=WGS84 +no_defs, it takes a while...

✔ [2024-01-04 17:13:07.54895] SAVE ==> results have been saved to: object$spa$rast$cla$LC_Type1

In [23]:
# Show dataset 
dataset.dts.aliyun %>% show_dataset()
── The Summary of Microgeo Dataset ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
! object$mat: 6808 ASVs/genes and 1244 samples [need to be subsampled!]

ℹ object$ant: 6808 ASVs/genes and 7 annotation levels (Kingdom, Phylum, Class, Order, Family, Genus, Species)

ℹ object$met: 1244 samples and 2 variables (longitude, latitude)

ℹ object$map: a SpatialPolygonsDataFrame with the CRS of '+proj=longlat +datum=WGS84 +no_defs'

ℹ object$phy: a phylogenetic tree with 6808 tip labels

ℹ object$env: 1244 samples and 10 variables

── The Summary of Biogeographic Traits ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
✔ object$spa: 23 historically numeric variables; 1 historically classification variables; 4 groups of future climate data

• To check the summary of dataset, Replace `object` with the variable name of your dataset
• For example, if the variable name is `dataset.dts`you can run `head(dataset.dts$met)` to check the content of `met`
In [24]:
# We need to subset classification data before the visualization. Otherwise, it may take a very long time.
# Due to the high data resolution, it is not recommended to use `add_crs()` for visualization here. Otherwise, it may take a very long time.
options(repr.plot.width = 16.4, repr.plot.height = 8.02)
g.spat.raster <- subset_cla_spatraster(spat.raster = dataset.dts.aliyun$spa$rast$cla$LC_Type1, use.class = c(10, 16)) # Only display grassland and barren
plot_bmap(map = dataset.dts.aliyun$map) %>% add_spatraster(spat.raster = g.spat.raster, color = c("darkgreen", "brown"))
No description has been provided for this image

Collect the world soil properties¶

The get_soilgrid() is implemented to download soil metrics from SoilGRIDS In the current version, there are 8 avaliable metrics. Here is a simple example.

In [25]:
# Download soil metrics from the SoilGRIDS for the research region
dataset.dts.aliyun %<>% get_soilgrid(depth = 5, measures = c("phh2o", "soc"), out.dir = "../dev/dat/rundata")
✔ [2024-01-04 17:13:23.655202] SAVE ==> results have been saved to: object$spa$rast$his$phh2o

✔ [2024-01-04 17:13:31.002808] SAVE ==> results have been saved to: object$spa$rast$his$soc

In [26]:
# Show dataset 
dataset.dts.aliyun %>% show_dataset()
── The Summary of Microgeo Dataset ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
! object$mat: 6808 ASVs/genes and 1244 samples [need to be subsampled!]

ℹ object$ant: 6808 ASVs/genes and 7 annotation levels (Kingdom, Phylum, Class, Order, Family, Genus, Species)

ℹ object$met: 1244 samples and 2 variables (longitude, latitude)

ℹ object$map: a SpatialPolygonsDataFrame with the CRS of '+proj=longlat +datum=WGS84 +no_defs'

ℹ object$phy: a phylogenetic tree with 6808 tip labels

ℹ object$env: 1244 samples and 10 variables

── The Summary of Biogeographic Traits ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
✔ object$spa: 25 historically numeric variables; 1 historically classification variables; 4 groups of future climate data

• To check the summary of dataset, Replace `object` with the variable name of your dataset
• For example, if the variable name is `dataset.dts`you can run `head(dataset.dts$met)` to check the content of `met`
In [27]:
# Visualize the phh2o and soc
options(repr.plot.width = 13.43 * 2, repr.plot.height = 7.9)
q15 <- plot_bmap(map = dataset.dts.aliyun$map) %>%
    add_spatraster(spat.raster = dataset.dts.aliyun$spa$rast$his$phh2o) %>% 
    add_north_arrow() %>% add_scale_bar() %>% add_crs()
q16 <- plot_bmap(map = dataset.dts.aliyun$map) %>%
    add_spatraster(spat.raster = dataset.dts.aliyun$spa$rast$his$soc) %>% 
    add_north_arrow() %>% add_scale_bar() %>% add_crs()
cowplot::plot_grid(q15, q16, align = 'hv', ncol = 2, labels = c("a", "b"), label_size = 28) %>% suppressWarnings()
No description has been provided for this image

Collect the soil properties of China¶

This function of get_soilcn() is used to process soil metrics downloaded from here. Due to the limitations in copyrights, the spatial data of China soil properties should be manually downloaded. Here is a simple example.

In [28]:
# Process soil metrics of CHINA for a research area
# You should download the .nc files, and then place theme into `../dev/dat/rundata/soilchina_products` before run `get_soilcn()`
dataset.dts.aliyun %<>% get_soilcn(depth = 0.045, measures = c("PH", "SOM"), out.dir = "../dev/dat/rundata")
✔ [2024-01-04 17:13:41.836801] SAVE ==> results have been saved to: object$spa$rast$his$PH

✔ [2024-01-04 17:13:49.22903] SAVE ==> results have been saved to: object$spa$rast$his$SOM

In [29]:
# Show dataset 
dataset.dts.aliyun %>% show_dataset()
── The Summary of Microgeo Dataset ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
! object$mat: 6808 ASVs/genes and 1244 samples [need to be subsampled!]

ℹ object$ant: 6808 ASVs/genes and 7 annotation levels (Kingdom, Phylum, Class, Order, Family, Genus, Species)

ℹ object$met: 1244 samples and 2 variables (longitude, latitude)

ℹ object$map: a SpatialPolygonsDataFrame with the CRS of '+proj=longlat +datum=WGS84 +no_defs'

ℹ object$phy: a phylogenetic tree with 6808 tip labels

ℹ object$env: 1244 samples and 10 variables

── The Summary of Biogeographic Traits ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
✔ object$spa: 27 historically numeric variables; 1 historically classification variables; 4 groups of future climate data

• To check the summary of dataset, Replace `object` with the variable name of your dataset
• For example, if the variable name is `dataset.dts`you can run `head(dataset.dts$met)` to check the content of `met`
In [30]:
# Visualize the PH and SOM 
options(repr.plot.width = 13.43 * 2, repr.plot.height = 7.9)
q17 <- plot_bmap(map = dataset.dts.aliyun$map) %>%
    add_spatraster(spat.raster = dataset.dts.aliyun$spa$rast$his$PH) %>% 
    add_north_arrow() %>% add_scale_bar() %>% add_crs()
q18 <- plot_bmap(map = dataset.dts.aliyun$map) %>%
    add_spatraster(spat.raster = dataset.dts.aliyun$spa$rast$his$SOM) %>% 
    add_north_arrow() %>% add_scale_bar() %>% add_crs()
cowplot::plot_grid(q17, q18, align = 'hv', ncol = 2, labels = c("a", "b"), label_size = 28) %>% suppressWarnings()
No description has been provided for this image

Extract spatial data for each sample¶

After all spatial data have been successfully downloaded, we can use the extract_data_from_spatraster() function to extract these spatial data based on the latitude and longitude of the samples. Here is a simple example.

In [31]:
# Show spatial variable names before the extraction 
dataset.dts.aliyun %<>% get_spa_vars()
dataset.dts.aliyun$spa$unit
A data.frame: 28 × 2
measureunit
<chr><chr>
1ELEV m
2AI
3Bio1 degree centigrade
4Bio2 degree centigrade
5Bio3 degree centigrade
6Bio4 degree centigrade
7Bio5 degree centigrade
8Bio6 degree centigrade
9Bio7 degree centigrade
10Bio8 degree centigrade
11Bio9 degree centigrade
12Bio10 degree centigrade
13Bio11 degree centigrade
14Bio12 mm
15Bio13 mm
16Bio14 mm
17Bio15 mm
18Bio16 mm
19Bio17 mm
20Bio18 mm
21Bio19 mm
22NDVI
23EVI
24PH
25SOM g/100g
26phh2o
27soc g kg^-1
28LC_Type1
In [32]:
# Extract spatial data for samples
# A `data.frame` of historically spatial variables for each sample is avaliable at `dataset.dts.aliyun$spa$tabs`
dataset.dts.aliyun %<>% extract_data_from_spatraster() %>% suppressWarnings() 
head(dataset.dts.aliyun$spa$tabs)
! [2024-01-04 17:13:55.036938] WARN ==> Some samples were failed to be applied for extraction. use `remove.na = FALSE` to check them!

✔ [2024-01-04 17:13:55.048203] SAVE ==> results have been saved to: object$spa$tabs

A data.frame: 6 × 28
LC_Type1AIELEVBio1Bio2Bio3Bio4Bio5Bio6Bio7⋯Bio16Bio17Bio18Bio19NDVIEVIphh2osocPHSOM
<fct><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl>⋯<dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl>
s1100.47634077.8 0.0524333814.959841.23052776.270815.9224-20.360836.2832⋯298.414298.415.40.70961710.50454367.058.27.7817.354
s2100.47634077.8 0.0524333814.959841.23052776.270815.9224-20.360836.2832⋯298.414298.415.40.70961710.50454367.058.27.7817.354
s3100.47634077.8 0.0524333814.959841.23052776.270815.9224-20.360836.2832⋯298.414298.415.40.70961710.50454367.058.27.7817.354
s4100.47634077.8 0.0524333814.959841.23052776.270815.9224-20.360836.2832⋯298.414298.415.40.70961710.50454367.058.27.7817.354
s5100.47634077.8 0.0524333814.959841.23052776.270815.9224-20.360836.2832⋯298.414298.415.40.70961710.50454367.058.27.7817.354
s6100.48744103.2-0.0442666614.993241.31165775.450615.8096-20.483236.2928⋯299.614299.615.60.73031100.52891116.963.57.7817.354
In [33]:
# Check sample numbers 
dim(dataset.dts.aliyun$spa$tabs)
  1. 1095
  2. 28
In [34]:
# Finally, we tidy up the dataset again 
dataset.dts.aliyun %<>% rarefy_count_table()
dataset.dts.aliyun %<>% tidy_dataset()
dataset.dts.aliyun %>% show_dataset()
ℹ [2024-01-04 17:13:57.773302] INFO ==> the ASV/gene abundance table has been rarefied with a sub-sample depth of 5310

── The Summary of Microgeo Dataset ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
ℹ object$mat: 6807 ASVs/genes and 1095 samples [subsample depth: 5310]

ℹ object$ant: 6807 ASVs/genes and 7 annotation levels (Kingdom, Phylum, Class, Order, Family, Genus, Species)

ℹ object$met: 1095 samples and 2 variables (longitude, latitude)

ℹ object$map: a SpatialPolygonsDataFrame with the CRS of '+proj=longlat +datum=WGS84 +no_defs'

ℹ object$phy: a phylogenetic tree with 6807 tip labels

ℹ object$env: 1095 samples and 10 variables

── The Summary of Biogeographic Traits ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
✔ object$spa: 27 historically numeric variables; 1 historically classification variables; 4 groups of future climate data

• To check the summary of dataset, Replace `object` with the variable name of your dataset
• For example, if the variable name is `dataset.dts`you can run `head(dataset.dts$met)` to check the content of `met`